Nonconvex minimax problems have attracted wide attention in machine learning, signal processing and many other fields in recent years. In this paper, we propose a primal dual alternating proximal gradient (PDAPG) algorithm and a primal dual proximal gradient (PDPG-L) algorithm for solving nonsmooth nonconvex-strongly concave and nonconvex-linear minimax problems with coupled linear constraints, respectively. The iteration complexities of the two algorithms are proved to be $\mathcal{O}\left( \varepsilon ^{-2} \right)$ and $\mathcal{O}\left( \varepsilon ^{-3} \right)$, respectively, to reach an $\varepsilon$-stationary point. To our knowledge, these are the first algorithms with iteration complexity guarantees for solving these two classes of minimax problems.
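The alternating proximal-gradient idea behind such primal-dual methods can be sketched on a toy problem. Below is a minimal illustration, not the paper's PDAPG: a nonconvex-strongly concave objective $f(x,y) = x^2/(1+x^2) + 2xy - y^2$ with an $\ell_1$ term on $x$ handled by a soft-threshold proximal step. The objective, step sizes, and regularization weight are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (handles the nonsmooth term)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def grad_x(x, y):
    # d/dx [ x^2/(1+x^2) + 2xy - y^2 ]
    return 2.0 * x / (1.0 + x**2) ** 2 + 2.0 * y

def grad_y(x, y):
    # d/dy [ x^2/(1+x^2) + 2xy - y^2 ]
    return 2.0 * x - 2.0 * y

def alt_prox_grad(x0=1.0, y0=0.0, eta_x=0.05, eta_y=0.2, lam=0.01, iters=500):
    x, y = x0, y0
    for _ in range(iters):
        # primal proximal-gradient (descent) step on x
        x = soft_threshold(x - eta_x * grad_x(x, y), eta_x * lam)
        # dual gradient (ascent) step on y; f is strongly concave in y
        y = y + eta_y * grad_y(x, y)
    return x, y

x_star, y_star = alt_prox_grad()
```

For this toy instance the inner maximizer is $y^*(x)=x$, and the regularized primal converges to the stationary point at the origin.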
Much recent research effort has been devoted to developing efficient algorithms for solving minimax problems with theoretical convergence guarantees, due to the relevance of these problems to a number of emerging applications. In this paper, we propose a unified single-loop alternating gradient projection (AGP) algorithm for solving smooth nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems. AGP employs simple gradient projection steps to update the primal and dual variables at each iteration. We show that it can find an $\varepsilon$-stationary point of the objective function in $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp. $\mathcal{O}\left( \varepsilon ^{-4} \right)$) iterations under the nonconvex-strongly concave (resp. nonconvex-concave) setting. Moreover, its gradient complexity to obtain an $\varepsilon$-stationary point of the objective function is bounded by $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp. $\mathcal{O}\left( \varepsilon ^{-4} \right)$) under the strongly convex-nonconcave (resp. convex-nonconcave) setting. To the best of our knowledge, this is the first time that a simple and unified single-loop algorithm has been developed for solving both nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems. Moreover, complexity results for solving the latter class of (strongly) convex-nonconcave minimax problems have never been obtained in the literature before. Numerical results show the efficiency of the proposed AGP algorithm. Furthermore, we extend the AGP algorithm by proposing a block alternating proximal gradient (BAPG) algorithm for solving more general multi-block nonsmooth nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems. We can similarly establish the gradient complexity of the proposed algorithm under these four different settings.
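The single-loop structure described above, one projected gradient step per variable per iteration, can be sketched as follows. This is an illustrative simplification (a toy convex-concave objective on box constraints, with the regularization terms of the full method omitted), not the paper's AGP with its specific parameter schedules.

```python
import numpy as np

def project_box(v, lo, hi):
    """Euclidean projection onto the box [lo, hi]."""
    return np.clip(v, lo, hi)

def agp_sketch(iters=300, eta_x=0.1, eta_y=0.1):
    # Toy saddle problem: f(x, y) = 0.5*x^2 + x*y - 0.5*y^2 on x, y in [-1, 1],
    # whose unique saddle point is (0, 0).
    x, y = 1.0, -1.0
    for _ in range(iters):
        gx = x + y                                   # df/dx
        x = project_box(x - eta_x * gx, -1.0, 1.0)   # primal projection step
        gy = x - y                                   # df/dy (at updated x)
        y = project_box(y + eta_y * gy, -1.0, 1.0)   # dual projection step
    return x, y

x_star, y_star = agp_sketch()
```

Each iteration costs only two gradient evaluations and two projections, which is what makes the single-loop scheme attractive.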
We study the problem of offline imitation learning (IL), where an agent aims to learn an optimal expert behavior policy without additional online environment interactions. Instead, the agent is provided with a supplementary offline dataset from suboptimal behaviors. Prior works that address this problem either require that expert data occupy a large proportion of the offline dataset, or need to learn a reward function and perform offline reinforcement learning (RL) afterwards. In this paper, we aim to address the problem without the additional steps of reward learning and offline RL training, for the case when demonstrations contain a large proportion of suboptimal data. Built upon behavioral cloning (BC), we introduce an additional discriminator to distinguish expert and non-expert data. We propose a cooperative framework to boost the learning of both tasks, and based on this framework we design a new IL algorithm in which the output of the discriminator serves as the weight of the BC loss. Experimental results show that our proposed algorithm achieves higher returns and faster training speed compared to baseline algorithms.
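The core mechanism, weighting the BC loss by a discriminator's expert-likeness score, can be sketched on synthetic data. Everything below is an illustrative assumption (a linear policy, a hand-crafted stand-in discriminator, and a toy expert rule $a = 2s$), not the paper's architecture: down-weighting non-expert transitions recovers the expert rule much better than plain BC on the mixed dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mixed demonstrations: half expert (a = 2*s), half suboptimal (random actions).
states = rng.normal(size=(200, 1))
is_expert = rng.random(200) < 0.5
actions = np.where(is_expert[:, None], 2.0 * states, rng.normal(size=(200, 1)))

def disc_score(s, a):
    """Stand-in discriminator: score in (0, 1], peaks when (s, a) looks expert-like."""
    return 1.0 / (1.0 + 10.0 * (a - 2.0 * s) ** 2)

def weighted_bc_fit(states, actions):
    """Weighted least squares for a linear policy a = theta * s,
    with per-sample weights given by the discriminator."""
    w = disc_score(states, actions).ravel()
    num = np.sum(w * states.ravel() * actions.ravel())
    den = np.sum(w * states.ravel() ** 2)
    return num / den

theta = weighted_bc_fit(states, actions)  # should land near the expert's 2.0
```

In the real algorithm the discriminator is of course learned jointly with the policy rather than hand-crafted.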
Virtual try-on aims to generate a photo-realistic fitting result given an in-shop garment and a reference person image. Existing methods usually build multi-stage frameworks that deal with clothes warping and body blending separately, or rely heavily on intermediate parser-based labels, which may be noisy or even inaccurate. To address the above challenges, we propose a single-stage try-on framework by developing a novel Deformable Attention Flow (DAFlow), which applies the deformable attention scheme to multi-flow estimation. With only pose keypoints as guidance, self- and cross-deformable attention flows are estimated for the reference person and the garment image, respectively. By sampling multiple flow fields, feature-level and pixel-level information from different semantic areas is simultaneously extracted and merged through the attention mechanism. This enables clothes warping and body synthesis at the same time, leading to photo-realistic results in an end-to-end manner. Extensive experiments on two try-on datasets demonstrate that our proposed method achieves state-of-the-art performance both qualitatively and quantitatively. Furthermore, additional experiments on two other image-editing tasks illustrate the versatility of our method for multi-view synthesis and image animation.
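The basic operation underlying any flow-based warping pipeline like the one above is sampling the source image at flow-offset locations. The sketch below uses nearest-neighbor sampling and a constant flow field for clarity; these are illustrative simplifications (real try-on models use differentiable bilinear sampling and learned, spatially varying flows).

```python
import numpy as np

def warp_nearest(src, flow):
    """Warp a grayscale image by a flow field.

    src:  (H, W) source image.
    flow: (H, W, 2) per-output-pixel (dy, dx) sampling offsets.
    Each output pixel copies the source pixel at its offset location
    (nearest neighbor, clamped at the border)."""
    h, w = src.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sy = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    sx = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    return src[sy, sx]

img = np.arange(16.0).reshape(4, 4)
shift_right = np.zeros((4, 4, 2))
shift_right[..., 1] = -1.0   # sample one pixel to the left -> content moves right
out = warp_nearest(img, shift_right)
```

Estimating several such flow fields and merging the sampled results with attention weights is what the multi-flow scheme above amounts to.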
Cross-domain cold-start recommendation is an increasingly emerging issue for recommender systems. Existing works mainly focus on solving either cross-domain user recommendation or cold-start content recommendation. However, when a new domain evolves at its early stage, it has potential users similar to those of the source domain but with far fewer interactions. It is critical to learn a user's preference from the source domain and transfer it into the target domain, especially on newly arriving content with limited user feedback. To bridge this gap, we propose a self-trained Cross-Domain User Preference Learning (COUPLE) framework, targeting cold-start recommendation with various semantic tags, such as attributes of items or genres of videos. More specifically, we consider three levels of preference, including user history, user content and user group, to provide reliable recommendation. With the user history represented by a domain-aware sequential model, a frequency encoder is applied to the underlying tags for user content preference learning. Then, a hierarchical memory tree with orthogonal node representations is proposed to further generalize user group preference across domains. The whole framework is updated in a contrastive way with a first-in-first-out (FIFO) queue to obtain more distinctive representations. Extensive experiments on two datasets demonstrate the efficiency of COUPLE in both user and content cold-start situations. By deploying an online A/B test for a week, we show that the click-through rate (CTR) of COUPLE is superior to that of other baselines on the Taobao APP. The method is now serving online for cross-domain cold-start micro-video recommendation.
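The FIFO queue mentioned above is a standard device in contrastive training: embeddings from recent batches serve as negatives, and the oldest entries are evicted once capacity is reached. A minimal sketch (queue capacity and the toy embeddings are illustrative assumptions):

```python
from collections import deque

import numpy as np

class FIFOQueue:
    """Fixed-capacity queue of embeddings for contrastive negatives."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest entry automatically on append
        self.buf = deque(maxlen=capacity)

    def enqueue(self, batch):
        for v in batch:
            self.buf.append(v)

    def negatives(self):
        """Current queue contents, stacked as a (n, dim) array."""
        return np.stack(self.buf)

q = FIFOQueue(capacity=8)
q.enqueue(np.arange(6).reshape(6, 1))        # items 0..5
q.enqueue(np.arange(6, 12).reshape(6, 1))    # items 6..11 -> 0..3 evicted
negs = q.negatives()                         # holds items 4..11
```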
We detail an approach to developing Stein's method for bounding integral metrics on probability measures defined on a Riemannian manifold $\mathbf{M}$. Our approach exploits the relationship between the generator of a diffusion on $\mathbf{M}$ having a target invariant measure and its characterising Stein operator. We consider a pair of such diffusions with different starting points, and through analysis of the distance process between the pair, derive Stein factors, which bound the solution to the Stein equation and its derivatives. The Stein factors contain curvature-dependent terms and reduce to those currently available for $\mathbb{R}^m$; moreover, they imply that the bounds for $\mathbb{R}^m$ remain valid on $\mathbf{M}$.
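For concreteness, the generator-to-Stein-operator connection the abstract refers to can be written out in the familiar Euclidean special case $\mathbf{M} = \mathbb{R}^m$ with target density $p$ (the manifold-valued operators and the convention for constants are those of the paper; the version below is the standard overdamped Langevin construction):

```latex
% Overdamped Langevin diffusion with invariant measure p:
%   dX_t = \tfrac{1}{2} \nabla \log p(X_t)\, dt + dB_t .
% Its generator, applied to a test function f, yields the Stein operator
\mathcal{A}_p f(x) = \frac{1}{2}\,\Delta f(x)
  + \frac{1}{2}\,\big\langle \nabla \log p(x),\, \nabla f(x) \big\rangle,
\qquad \mathbb{E}_{X \sim p}\big[\mathcal{A}_p f(X)\big] = 0 .
```

The characterising property on the right is what makes $\mathcal{A}_p$ usable for bounding integral probability metrics via the Stein equation.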
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way for developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
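Finding 1) above, matching token relations rather than individual token features, can be sketched as follows. The random features, temperature, and plain KL objective are illustrative assumptions, not TinyMIM's exact loss: the point is only that the distillation target is each token's softmax-normalized similarity to every other token.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def token_relations(tokens, tau=1.0):
    """Row-softmax of the (N x N) token-to-token similarity matrix."""
    sim = tokens @ tokens.T / tau
    return softmax(sim, axis=-1)

def relation_distill_loss(student_tokens, teacher_tokens, tau=1.0):
    """Mean per-token KL divergence between teacher and student relations."""
    p = token_relations(teacher_tokens, tau)   # target relations
    q = token_relations(student_tokens, tau)   # student relations
    eps = 1e-12
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))) / len(p))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(16, 32))            # 16 tokens, 32-dim features
loss_self = relation_distill_loss(teacher, teacher)             # identical -> 0
loss_rand = relation_distill_loss(rng.normal(size=(16, 32)), teacher)
```

Because the target is a relation matrix, student and teacher need not share a feature dimension, which is convenient when the student is much smaller.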
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT exhibits strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
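The simpler of the two attacks, NAIVEATTACK, amounts to stamping a trigger patch onto raw images and relabeling them to the attacker's target class before distillation begins. A minimal sketch of that poisoning step (patch size, patch value, and target label are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def add_trigger(images, labels, target_label=0, patch=3, value=1.0):
    """Stamp a patch x patch trigger in the bottom-right corner of each image
    and relabel every poisoned sample to the attacker's target class.

    images: (N, H, W) float array; labels: (N,) int array.
    Returns poisoned copies; the originals are left untouched."""
    poisoned = images.copy()
    poisoned[:, -patch:, -patch:] = value
    return poisoned, np.full_like(labels, target_label)

rng = np.random.default_rng(0)
imgs = rng.random((4, 28, 28))
lbls = np.array([3, 1, 4, 1])
p_imgs, p_lbls = add_trigger(imgs, lbls)
```

DOORPING differs in that the trigger itself is optimized and re-injected at every distillation iteration rather than fixed up front.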
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation of image content, which complicate the distortion patterns across different scales and aggravate the difficulty of the regression problem in BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that make the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression problem, following the easy-to-hard principle of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and that both the MS and PMT modules improve the model's performance.
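An easy-to-hard progressive schedule of the kind described above can be sketched as a pair of task weights that shift over training: early on the loss emphasizes an easier auxiliary task (e.g. coarse quality classification), later the harder quality-regression task dominates. The sigmoid schedule and its sharpness are illustrative assumptions, not PMT-IQA's exact scheme.

```python
import numpy as np

def progressive_weights(step, total_steps, sharpness=10.0):
    """Return (w_easy, w_hard) with w_easy + w_hard = 1.

    The hard-task weight follows a sigmoid in normalized training time,
    so the objective shifts smoothly from the easy task to the hard one."""
    t = step / total_steps
    w_hard = 1.0 / (1.0 + np.exp(-sharpness * (t - 0.5)))
    return 1.0 - w_hard, w_hard

def total_loss(easy_loss, hard_loss, step, total_steps):
    w_easy, w_hard = progressive_weights(step, total_steps)
    return w_easy * easy_loss + w_hard * hard_loss

w_start = progressive_weights(0, 100)    # almost all weight on the easy task
w_end = progressive_weights(100, 100)    # almost all weight on the hard task
```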